-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: added packet count telemetry metrics #2797
Conversation
✅ Deploy Preview for brilliant-pasca-3e80ec ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
This comment was marked as resolved.
This comment was marked as resolved.
17f5214
to
d4bb021
Compare
This comment was marked as resolved.
This comment was marked as resolved.
Please don't forget to look at the new dashboard additions as well please: https://rafikitelemetry.grafana.net/d/cdq2hn0w6frpcr/test-packet-count?orgId=1 |
packages/backend/src/open_payments/payment/incoming/service.test.ts
Outdated
Show resolved
Hide resolved
packages/backend/src/payment-method/ilp/connector/core/middleware/ilp-packet.ts
Outdated
Show resolved
Hide resolved
Regarding the
I think probably just the prepare amount? Currently we are summing fulfill/reject/prepare but the prepare include fulfill and reject already it looks like. So I think we can just remove the last one here (and related graphs): |
packages/backend/src/payment-method/ilp/connector/core/middleware/ilp-packet.ts
Outdated
Show resolved
Hide resolved
packages/backend/src/payment-method/ilp/connector/core/middleware/ilp-packet.ts
Outdated
Show resolved
Hide resolved
How will the existing visualizaitons be impacted? It looks like maybe the new |
I don't have strong feelings about whichever approach. I know Max was initially hoping we'd collect this to get a handle on packet loss through the network but this tricky when telemetry is optional. Given that I assumed we would still be most interested in 1. the total actual number of packets moving through the network (prepare + reject + fulfill) and 2. the relationship between them (we expect prepare = fulfill + reject) and maybe good to see how many rejects we get vs fulfills as well. Of course some of the rejects are quote based as well. I guess I'm not exactly sure. Maybe we can just graph fulfill and reject and leave the prepare and totals out? |
Good question. This felt obvious when I started but less so now. |
1803725
to
6e29243
Compare
|
||
While `handleIlpData` is an effective location for telemetry collection, it has some limitations: | ||
|
||
Sender-Side Limitation: Currently, metrics are only collected on the sender side. This is adequate for now but does not capture data from connecting nodes, which we plan to address in future implementations when multi-hop support is added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to add a short explanation for the connector nodes: eg in A > B > C, we only collect packet information in A
- packet_count_reject: Counts the reject packets received. | ||
- packet_amount_fulfill: Records the amount sent in fulfill packets. | ||
|
||
These metrics provide valuable insights, including potential packet loss. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this, and this is not exactly packet loss like it is defined in the networking sense (since we only count them in the originating connector) , but something like rate of success/error instead
|
||
### Challenges with the Current Setup | ||
|
||
While `handleIlpData` is an effective location for telemetry collection, it has some limitations: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not having function names in documentation since it becomes outdated quite quickly, I think we can just remove references to it and keep just the general explanation which you have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I've included function names is try and avoid any duplicated work when the next person needs to update this or change the metric location. Is there somewhere better for me to use function names and get really specific outside of the README?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a couple of comments.
packages/backend/src/open_payments/payment/outgoing/service.test.ts
Outdated
Show resolved
Hide resolved
packages/backend/src/payment-method/ilp/connector/core/telemetry.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, minor comments (main one about some tests)
packages/backend/src/payment-method/ilp/connector/core/rafiki.ts
Outdated
Show resolved
Hide resolved
@@ -1161,6 +1161,85 @@ describe('OutgoingPaymentService', (): void => { | |||
return payment | |||
} | |||
|
|||
test('Telemetry Transaction Counter increments for COMPLETED transactions', async (): Promise<void> => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for adding this
packages/backend/src/payment-method/ilp/connector/core/README.md
Outdated
Show resolved
Hide resolved
packages/backend/src/payment-method/ilp/connector/core/README.md
Outdated
Show resolved
Hide resolved
import { IlpFulfillFactory, IlpRejectFactory } from '../factories' | ||
import { ValueType } from '@opentelemetry/api' | ||
|
||
describe('Connector Core Telemetry', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should add a few tests for handling non-positive cases (e.g., "... doesn't collect metrics for unfulfillable packets")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So should I just pass in unfulfillable
as true
and check it doesn't call incrementCounter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did my best, have a look and see if the new tests meet this criteria :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, looks good, thanks!
Co-authored-by: Max Kurapov <[email protected]>
Co-authored-by: Max Kurapov <[email protected]>
…ger/rafiki into 2734/sj/telemetry-count-packets
@JoblersTune just need to fix the build and good to go |
This PR has been waiting a while, so I nearly forgot there was a doc change here too. I'll just paste the text here for your reference and remove the file changes in this PR (telemetry overview page)
|
@JoblersTune Thank you! I'll get the updates worked in to our other branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Changes proposed in this pull request
packet_amount_fulfill
and included source metadata with the amount metric collection.Moved the collection of all packet data over to the receiving node so that it can log a consistent match of incoming prepares to outgoing responses. This way, since telemetry is optional, we'll get a coherent reporting structure since one node can send coherent results in isolation.handleIlpData
function which is called only on sending nodes, and gives us access to whether packets are unfulfillable so we don't need to count quote packets.Cleaned up old telemetry code that is no longer needed after changes were madeOutside of this PR but related:
Context
Working towards expanding telemetry for telemetry v2
TODO:
Checklist
fixes #number